{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Selecting subsets\n",
"\n",
"Analyzing a well-chosen subset of a dataset often reveals more than analyzing the whole dataset at once.\n",
"\n",
"With large datasets, restricting the analysis to a subset gives more focused and meaningful results than working with every row.\n",
"\n",
"Subsetting lets us identify patterns and trends within specific subgroups. It also removes noise and irrelevant records from the analysis, which makes it easier to draw accurate conclusions.\n",
"\n",
"Whether the goal is scientific research or business intelligence, choosing the right subset is often a crucial step toward deeper insight."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How To"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude housing_median_age total_rooms total_bedrooms \\\n",
"0 -122.23 37.88 41.0 880.0 129.0 \n",
"1 -122.22 37.86 21.0 7099.0 1106.0 \n",
"2 -122.24 37.85 52.0 1467.0 190.0 \n",
"3 -122.25 37.85 52.0 1274.0 235.0 \n",
"4 -122.25 37.85 52.0 1627.0 280.0 \n",
"\n",
" population households median_income median_house_value ocean_proximity \n",
"0 322.0 126.0 8.3252 452600.0 NEAR BAY \n",
"1 2401.0 1138.0 8.3014 358500.0 NEAR BAY \n",
"2 496.0 177.0 7.2574 352100.0 NEAR BAY \n",
"3 558.0 219.0 5.6431 341300.0 NEAR BAY \n",
"4 565.0 259.0 3.8462 342200.0 NEAR BAY "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Load the housing data and preview the first five rows\n",
"df = pd.read_csv(\"data/housing.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
" ... \n",
"20635 False\n",
"20636 False\n",
"20637 False\n",
"20638 False\n",
"20639 False\n",
"Name: longitude, Length: 20640, dtype: bool"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Boolean mask: True for every row west of longitude -122\n",
"df.longitude < -122"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude housing_median_age total_rooms total_bedrooms \\\n",
"0 -122.23 37.88 41.0 880.0 129.0 \n",
"1 -122.22 37.86 21.0 7099.0 1106.0 \n",
"2 -122.24 37.85 52.0 1467.0 190.0 \n",
"3 -122.25 37.85 52.0 1274.0 235.0 \n",
"4 -122.25 37.85 52.0 1627.0 280.0 \n",
"... ... ... ... ... ... \n",
"20573 -122.05 38.56 20.0 1005.0 168.0 \n",
"20581 -122.21 38.83 20.0 1138.0 221.0 \n",
"20582 -122.16 38.90 33.0 1221.0 236.0 \n",
"20585 -122.04 38.68 26.0 1113.0 222.0 \n",
"20586 -122.03 38.69 23.0 1796.0 380.0 \n",
"\n",
" population households median_income median_house_value \\\n",
"0 322.0 126.0 8.3252 452600.0 \n",
"1 2401.0 1138.0 8.3014 358500.0 \n",
"2 496.0 177.0 7.2574 352100.0 \n",
"3 558.0 219.0 5.6431 341300.0 \n",
"4 565.0 259.0 3.8462 342200.0 \n",
"... ... ... ... ... \n",
"20573 457.0 157.0 5.6790 225000.0 \n",
"20581 459.0 209.0 3.1534 123400.0 \n",
"20582 488.0 199.0 3.7574 92700.0 \n",
"20585 689.0 234.0 3.0486 83600.0 \n",
"20586 939.0 330.0 2.7955 96300.0 \n",
"\n",
" ocean_proximity \n",
"0 NEAR BAY \n",
"1 NEAR BAY \n",
"2 NEAR BAY \n",
"3 NEAR BAY \n",
"4 NEAR BAY \n",
"... ... \n",
"20573 INLAND \n",
"20581 INLAND \n",
"20582 INLAND \n",
"20585 INLAND \n",
"20586 INLAND \n",
"\n",
"[3967 rows x 10 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Index the frame with the mask to keep only the rows where it is True\n",
"df[df.longitude < -122]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(20640, 10)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude housing_median_age total_rooms total_bedrooms \\\n",
"0 -122.23 37.88 41.0 880.0 129.0 \n",
"1 -122.22 37.86 21.0 7099.0 1106.0 \n",
"2 -122.24 37.85 52.0 1467.0 190.0 \n",
"3 -122.25 37.85 52.0 1274.0 235.0 \n",
"4 -122.25 37.85 52.0 1627.0 280.0 \n",
"... ... ... ... ... ... \n",
"20573 -122.05 38.56 20.0 1005.0 168.0 \n",
"20581 -122.21 38.83 20.0 1138.0 221.0 \n",
"20582 -122.16 38.90 33.0 1221.0 236.0 \n",
"20585 -122.04 38.68 26.0 1113.0 222.0 \n",
"20586 -122.03 38.69 23.0 1796.0 380.0 \n",
"\n",
" population households median_income median_house_value \\\n",
"0 322.0 126.0 8.3252 452600.0 \n",
"1 2401.0 1138.0 8.3014 358500.0 \n",
"2 496.0 177.0 7.2574 352100.0 \n",
"3 558.0 219.0 5.6431 341300.0 \n",
"4 565.0 259.0 3.8462 342200.0 \n",
"... ... ... ... ... \n",
"20573 457.0 157.0 5.6790 225000.0 \n",
"20581 459.0 209.0 3.1534 123400.0 \n",
"20582 488.0 199.0 3.7574 92700.0 \n",
"20585 689.0 234.0 3.0486 83600.0 \n",
"20586 939.0 330.0 2.7955 96300.0 \n",
"\n",
" ocean_proximity \n",
"0 NEAR BAY \n",
"1 NEAR BAY \n",
"2 NEAR BAY \n",
"3 NEAR BAY \n",
"4 NEAR BAY \n",
"... ... \n",
"20573 INLAND \n",
"20581 INLAND \n",
"20582 INLAND \n",
"20585 INLAND \n",
"20586 INLAND \n",
"\n",
"[2718 rows x 10 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine conditions with & (AND): a row must satisfy both to be kept\n",
"df[df.ocean_proximity.isin([\"NEAR BAY\", \"INLAND\"]) & (df.longitude < -122)]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['NEAR BAY', 'INLAND', 'NEAR OCEAN', '<1H OCEAN'], dtype=object)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine conditions with | (OR): a row is kept if either condition holds\n",
"df_subset = df[df.ocean_proximity.isin([\"NEAR BAY\", \"INLAND\"]) | (df.longitude < -122)]\n",
"df_subset.ocean_proximity.unique()"
]
},
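{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two filters above differ only in the operator: `&` keeps rows that satisfy both conditions, while `|` keeps rows that satisfy at least one, which is why all four `ocean_proximity` categories appear in `df_subset`. The next cell is a minimal sketch of the same idea on a tiny made-up frame; `toy` and its values are assumptions for illustration, not part of the housing data. Naming each mask first also sidesteps the parentheses that an inline comparison needs, since `&` and `|` bind more tightly than `<`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, invented for illustration only\n",
"toy = pd.DataFrame({\n",
"    \"longitude\": [-122.5, -121.0, -123.0, -118.0],\n",
"    \"ocean_proximity\": [\"NEAR BAY\", \"INLAND\", \"NEAR OCEAN\", \"<1H OCEAN\"],\n",
"})\n",
"\n",
"# Build each boolean mask separately so the combination reads clearly\n",
"near_or_inland = toy.ocean_proximity.isin([\"NEAR BAY\", \"INLAND\"])\n",
"west = toy.longitude < -122\n",
"\n",
"both = toy[near_or_inland & west]    # rows meeting both conditions\n",
"either = toy[near_or_inland | west]  # rows meeting at least one condition\n",
"len(both), len(either)"
]
},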
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## .loc"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude\n",
"0 -122.23 37.88\n",
"1 -122.22 37.86\n",
"2 -122.24 37.85\n",
"3 -122.25 37.85\n",
"4 -122.25 37.85\n",
"... ... ...\n",
"20635 -121.09 39.48\n",
"20636 -121.21 39.49\n",
"20637 -121.22 39.43\n",
"20638 -121.32 39.43\n",
"20639 -121.24 39.37\n",
"\n",
"[20640 rows x 2 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# .loc[rows, columns]: all rows, two columns selected by name\n",
"df.loc[:, [\"longitude\", \"latitude\"]]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude\n",
"5 -122.25 37.85\n",
"6 -122.25 37.84\n",
"7 -122.25 37.84\n",
"8 -122.26 37.84\n",
"9 -122.25 37.84\n",
".. ... ...\n",
"496 -122.26 37.85\n",
"497 -122.27 37.85\n",
"498 -122.27 37.85\n",
"499 -122.27 37.85\n",
"500 -122.27 37.85\n",
"\n",
"[496 rows x 2 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows labelled 5 through 500 (the end label is included), two columns\n",
"df.loc[5:500, [\"longitude\", \"latitude\"]]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" longitude latitude housing_median_age total_rooms total_bedrooms \\\n",
"0 -122.23 37.88 41.0 880.0 129.0 \n",
"1 -122.22 37.86 21.0 7099.0 1106.0 \n",
"2 -122.24 37.85 52.0 1467.0 190.0 \n",
"3 -122.25 37.85 52.0 1274.0 235.0 \n",
"4 -122.25 37.85 52.0 1627.0 280.0 \n",
"... ... ... ... ... ... \n",
"20635 -121.09 39.48 25.0 1665.0 374.0 \n",
"20636 -121.21 39.49 18.0 697.0 150.0 \n",
"20637 -121.22 39.43 17.0 2254.0 485.0 \n",
"20638 -121.32 39.43 18.0 1860.0 409.0 \n",
"20639 -121.24 39.37 16.0 2785.0 616.0 \n",
"\n",
" population households median_income median_house_value \\\n",
"0 322.0 126.0 8.3252 452600.0 \n",
"1 2401.0 1138.0 8.3014 358500.0 \n",
"2 496.0 177.0 7.2574 352100.0 \n",
"3 558.0 219.0 5.6431 341300.0 \n",
"4 565.0 259.0 3.8462 342200.0 \n",
"... ... ... ... ... \n",
"20635 845.0 330.0 1.5603 78100.0 \n",
"20636 356.0 114.0 2.5568 77100.0 \n",
"20637 1007.0 433.0 1.7000 92300.0 \n",
"20638 741.0 349.0 1.8672 84700.0 \n",
"20639 1387.0 530.0 2.3886 89400.0 \n",
"\n",
" ocean_proximity \n",
"0 NEAR BAY \n",
"1 NEAR BAY \n",
"2 NEAR BAY \n",
"3 NEAR BAY \n",
"4 NEAR BAY \n",
"... ... \n",
"20635 INLAND \n",
"20636 INLAND \n",
"20637 INLAND \n",
"20638 INLAND \n",
"20639 INLAND \n",
"\n",
"[20640 rows x 10 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Use latitude as the row index; it is no longer a regular column\n",
"df = df.set_index(\"latitude\")"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Float64Index([37.88, 37.86, 37.85, 37.85, 37.85, 37.85, 37.84, 37.84, 37.84,\n",
" 37.84,\n",
" ...\n",
" 39.29, 39.33, 39.26, 39.19, 39.27, 39.48, 39.49, 39.43, 39.43,\n",
" 39.37],\n",
" dtype='float64', name='latitude', length=20640)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude housing_median_age total_rooms total_bedrooms \\\n",
"latitude \n",
"37.85 -122.24 52.0 1467.0 190.0 \n",
"37.85 -122.25 52.0 1274.0 235.0 \n",
"37.85 -122.25 52.0 1627.0 280.0 \n",
"37.85 -122.25 52.0 919.0 213.0 \n",
"37.85 -122.26 52.0 2202.0 434.0 \n",
"\n",
" population households median_income median_house_value \\\n",
"latitude \n",
"37.85 496.0 177.0 7.2574 352100.0 \n",
"37.85 558.0 219.0 5.6431 341300.0 \n",
"37.85 565.0 259.0 3.8462 342200.0 \n",
"37.85 413.0 193.0 4.0368 269700.0 \n",
"37.85 910.0 402.0 3.2031 281500.0 \n",
"\n",
" ocean_proximity \n",
"latitude \n",
"37.85 NEAR BAY \n",
"37.85 NEAR BAY \n",
"37.85 NEAR BAY \n",
"37.85 NEAR BAY \n",
"37.85 NEAR BAY "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The latitude index is not unique, so a single label can match many rows\n",
"df.loc[37.85, :].head()"
]
},
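{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because many districts share a latitude, the index created by `set_index(\"latitude\")` contains repeated labels, and `.loc[37.85, :]` returns every row carrying that label rather than a single row. The next cell sketches this on a tiny made-up frame; `toy` and its values are assumptions for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with a repeated latitude value\n",
"toy = pd.DataFrame({\n",
"    \"latitude\": [37.85, 37.86, 37.85],\n",
"    \"population\": [496.0, 2401.0, 558.0],\n",
"}).set_index(\"latitude\")\n",
"\n",
"# Label lookup on a non-unique index returns all matching rows\n",
"matches = toy.loc[37.85, :]\n",
"matches.shape"
]
},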
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" latitude longitude population\n",
"0 37.88 -122.23 322.0\n",
"1 37.86 -122.22 2401.0\n",
"2 37.85 -122.24 496.0\n",
"3 37.85 -122.25 558.0\n",
"4 37.85 -122.25 565.0\n",
"... ... ... ...\n",
"20635 39.48 -121.09 845.0\n",
"20636 39.49 -121.21 356.0\n",
"20637 39.43 -121.22 1007.0\n",
"20638 39.43 -121.32 741.0\n",
"20639 39.37 -121.24 1387.0\n",
"\n",
"[20640 rows x 3 columns]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# reset_index() turns the latitude index back into a regular column\n",
"df[[\"longitude\", \"population\"]].reset_index()"
]
},
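{
"cell_type": "markdown",
"metadata": {},
"source": [
"`set_index` and `reset_index` undo each other: the first moves a column into the row index, the second moves the index back into a column and restores the default integer index. A minimal round-trip sketch, using a made-up frame (`toy` is an assumption, not part of the housing data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, invented for illustration only\n",
"toy = pd.DataFrame({\"latitude\": [37.88, 37.86], \"population\": [322.0, 2401.0]})\n",
"\n",
"indexed = toy.set_index(\"latitude\")  # latitude becomes the row index\n",
"restored = indexed.reset_index()     # latitude is a regular column again\n",
"\n",
"# Compare the round-tripped frame with the original\n",
"restored.equals(toy)"
]
},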
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index Slicing"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" total_bedrooms population households median_income \\\n",
"latitude \n",
"37.85 213.0 413.0 193.0 4.0368 \n",
"37.84 489.0 1094.0 514.0 3.6591 \n",
"37.84 687.0 1157.0 647.0 3.1200 \n",
"37.84 665.0 1206.0 595.0 2.0804 \n",
"37.84 707.0 1551.0 714.0 3.6912 \n",
"... ... ... ... ... \n",
"37.86 663.0 1316.0 590.0 5.3794 \n",
"37.85 768.0 1508.0 755.0 3.2619 \n",
"37.85 920.0 1800.0 815.0 2.7054 \n",
"37.85 400.0 719.0 326.0 2.2431 \n",
"37.85 300.0 675.0 255.0 1.9028 \n",
"\n",
" median_house_value \n",
"latitude \n",
"37.85 269700.0 \n",
"37.84 299200.0 \n",
"37.84 241400.0 \n",
"37.84 226700.0 \n",
"37.84 261100.0 \n",
"... ... \n",
"37.86 376900.0 \n",
"37.85 309600.0 \n",
"37.85 182300.0 \n",
"37.85 172700.0 \n",
"37.85 150800.0 \n",
"\n",
"[495 rows x 5 columns]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Position-based slicing: rows 5-499 and columns 3-7 (end positions are excluded)\n",
"df.iloc[5:500, 3:8]"
]
},
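{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the row counts: `df.loc[5:500, ...]` earlier returned 496 rows, while `df.iloc[5:500, 3:8]` returns 495. `.loc` slices by label and includes the end label; `.iloc` slices by position and excludes the end position, like ordinary Python slicing. The next cell is a small sketch of the difference on a made-up frame (`toy` is an assumption for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame with the default integer index 0..9\n",
"toy = pd.DataFrame({\"value\": range(10)})\n",
"\n",
"by_label = toy.loc[2:5]      # labels 2, 3, 4, 5: end label included\n",
"by_position = toy.iloc[2:5]  # positions 2, 3, 4: end position excluded\n",
"len(by_label), len(by_position)"
]
},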
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting Columns and Indices"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" longitude population\n",
"latitude \n",
"37.88 -122.23 322.0\n",
"37.86 -122.22 2401.0\n",
"37.85 -122.24 496.0\n",
"37.85 -122.25 558.0\n",
"37.85 -122.25 565.0\n",
"... ... ...\n",
"39.48 -121.09 845.0\n",
"39.49 -121.21 356.0\n",
"39.43 -121.22 1007.0\n",
"39.43 -121.32 741.0\n",
"39.37 -121.24 1387.0\n",
"\n",
"[20640 rows x 2 columns]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"longitude\", \"population\"]]"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" housing_median_age total_rooms total_bedrooms households \\\n",
"latitude \n",
"37.88 41.0 880.0 129.0 126.0 \n",
"37.86 21.0 7099.0 1106.0 1138.0 \n",
"37.85 52.0 1467.0 190.0 177.0 \n",
"37.85 52.0 1274.0 235.0 219.0 \n",
"37.85 52.0 1627.0 280.0 259.0 \n",
"... ... ... ... ... \n",
"39.48 25.0 1665.0 374.0 330.0 \n",
"39.49 18.0 697.0 150.0 114.0 \n",
"39.43 17.0 2254.0 485.0 433.0 \n",
"39.43 18.0 1860.0 409.0 349.0 \n",
"39.37 16.0 2785.0 616.0 530.0 \n",
"\n",
" median_income median_house_value ocean_proximity \n",
"latitude \n",
"37.88 8.3252 452600.0 NEAR BAY \n",
"37.86 8.3014 358500.0 NEAR BAY \n",
"37.85 7.2574 352100.0 NEAR BAY \n",
"37.85 5.6431 341300.0 NEAR BAY \n",
"37.85 3.8462 342200.0 NEAR BAY \n",
"... ... ... ... \n",
"39.48 1.5603 78100.0 INLAND \n",
"39.49 2.5568 77100.0 INLAND \n",
"39.43 1.7000 92300.0 INLAND \n",
"39.43 1.8672 84700.0 INLAND \n",
"39.37 2.3886 89400.0 INLAND \n",
"\n",
"[20640 rows x 7 columns]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop columns by name; this returns a new DataFrame and leaves df unchanged\n",
"df.drop([\"longitude\", \"population\"], axis=1)"
]
},
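{
"cell_type": "markdown",
"metadata": {},
"source": [
"Selecting columns with `df[[...]]` and removing them with `df.drop(...)` are complementary: the first lists what to keep, the second lists what to discard. Neither modifies `df` itself, so the result has to be assigned to a name to be reused. A minimal sketch on a made-up frame (`toy` and `trimmed` are hypothetical names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical one-row frame, invented for illustration only\n",
"toy = pd.DataFrame({\"longitude\": [-122.23], \"population\": [322.0], \"households\": [126.0]})\n",
"\n",
"# axis=1 targets columns; drop(columns=[...]) is an equivalent spelling\n",
"trimmed = toy.drop([\"longitude\", \"population\"], axis=1)\n",
"\n",
"# The original frame still has all of its columns\n",
"list(toy.columns), list(trimmed.columns)"
]
},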
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" longitude housing_median_age total_rooms total_bedrooms \\\n",
"latitude \n",
"37.88 -122.23 41.0 880.0 129.0 \n",
"37.86 -122.22 21.0 7099.0 1106.0 \n",
"37.84 -122.25 52.0 2535.0 489.0 \n",
"37.84 -122.25 52.0 3104.0 687.0 \n",
"37.84 -122.26 42.0 2555.0 665.0 \n",
"... ... ... ... ... \n",
"39.48 -121.09 25.0 1665.0 374.0 \n",
"39.49 -121.21 18.0 697.0 150.0 \n",
"39.43 -121.22 17.0 2254.0 485.0 \n",
"39.43 -121.32 18.0 1860.0 409.0 \n",
"39.37 -121.24 16.0 2785.0 616.0 \n",
"\n",
" population households median_income median_house_value \\\n",
"latitude \n",
"37.88 322.0 126.0 8.3252 452600.0 \n",
"37.86 2401.0 1138.0 8.3014 358500.0 \n",
"37.84 1094.0 514.0 3.6591 299200.0 \n",
"37.84 1157.0 647.0 3.1200 241400.0 \n",
"37.84 1206.0 595.0 2.0804 226700.0 \n",
"... ... ... ... ... \n",
"39.48 845.0 330.0 1.5603 78100.0 \n",
"39.49 356.0 114.0 2.5568 77100.0 \n",
"39.43 1007.0 433.0 1.7000 92300.0 \n",
"39.43 741.0 349.0 1.8672 84700.0 \n",
"39.37 1387.0 530.0 2.3886 89400.0 \n",
"\n",
" ocean_proximity \n",
"latitude \n",
"37.88 NEAR BAY \n",
"37.86 NEAR BAY \n",
"37.84 NEAR BAY \n",
"37.84 NEAR BAY \n",
"37.84 NEAR BAY \n",
"... ... \n",
"39.48 INLAND \n",
"39.49 INLAND \n",
"39.43 INLAND \n",
"39.43 INLAND \n",
"39.37 INLAND \n",
"\n",
"[20595 rows x 9 columns]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop([37.85], axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
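"## Why?\n",
"\n",
"Summary statistics for the whole dataset can hide differences between subgroups. Compare `describe()` on the full DataFrame with the same call on only the rows west of longitude -122: the mean `median_house_value` rises from about 206,856 to about 244,231."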
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>longitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>20640.000000</td><td>20640.000000</td><td>20640.000000</td><td>20433.000000</td><td>20640.000000</td><td>20640.000000</td><td>20640.000000</td><td>20640.000000</td></tr>\n",
"    <tr><th>mean</th><td>-119.569704</td><td>28.639486</td><td>2635.763081</td><td>537.870553</td><td>1425.476744</td><td>499.539680</td><td>3.870671</td><td>206855.816909</td></tr>\n",
"    <tr><th>std</th><td>2.003532</td><td>12.585558</td><td>2181.615252</td><td>421.385070</td><td>1132.462122</td><td>382.329753</td><td>1.899822</td><td>115395.615874</td></tr>\n",
"    <tr><th>min</th><td>-124.350000</td><td>1.000000</td><td>2.000000</td><td>1.000000</td><td>3.000000</td><td>1.000000</td><td>0.499900</td><td>14999.000000</td></tr>\n",
"    <tr><th>25%</th><td>-121.800000</td><td>18.000000</td><td>1447.750000</td><td>296.000000</td><td>787.000000</td><td>280.000000</td><td>2.563400</td><td>119600.000000</td></tr>\n",
"    <tr><th>50%</th><td>-118.490000</td><td>29.000000</td><td>2127.000000</td><td>435.000000</td><td>1166.000000</td><td>409.000000</td><td>3.534800</td><td>179700.000000</td></tr>\n",
"    <tr><th>75%</th><td>-118.010000</td><td>37.000000</td><td>3148.000000</td><td>647.000000</td><td>1725.000000</td><td>605.000000</td><td>4.743250</td><td>264725.000000</td></tr>\n",
"    <tr><th>max</th><td>-114.310000</td><td>52.000000</td><td>39320.000000</td><td>6445.000000</td><td>35682.000000</td><td>6082.000000</td><td>15.000100</td><td>500001.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" longitude housing_median_age total_rooms total_bedrooms \\\n",
"count 20640.000000 20640.000000 20640.000000 20433.000000 \n",
"mean -119.569704 28.639486 2635.763081 537.870553 \n",
"std 2.003532 12.585558 2181.615252 421.385070 \n",
"min -124.350000 1.000000 2.000000 1.000000 \n",
"25% -121.800000 18.000000 1447.750000 296.000000 \n",
"50% -118.490000 29.000000 2127.000000 435.000000 \n",
"75% -118.010000 37.000000 3148.000000 647.000000 \n",
"max -114.310000 52.000000 39320.000000 6445.000000 \n",
"\n",
" population households median_income median_house_value \n",
"count 20640.000000 20640.000000 20640.000000 20640.000000 \n",
"mean 1425.476744 499.539680 3.870671 206855.816909 \n",
"std 1132.462122 382.329753 1.899822 115395.615874 \n",
"min 3.000000 1.000000 0.499900 14999.000000 \n",
"25% 787.000000 280.000000 2.563400 119600.000000 \n",
"50% 1166.000000 409.000000 3.534800 179700.000000 \n",
"75% 1725.000000 605.000000 4.743250 264725.000000 \n",
"max 35682.000000 6082.000000 15.000100 500001.000000 "
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>longitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>3967.000000</td><td>3967.000000</td><td>3967.000000</td><td>3927.000000</td><td>3967.000000</td><td>3967.000000</td><td>3967.000000</td><td>3967.000000</td></tr>\n",
"    <tr><th>mean</th><td>-122.408072</td><td>33.831107</td><td>2461.945551</td><td>497.507003</td><td>1203.207714</td><td>468.308546</td><td>4.103604</td><td>244230.881775</td></tr>\n",
"    <tr><th>std</th><td>0.423863</td><td>13.270992</td><td>1710.886064</td><td>341.984939</td><td>842.230630</td><td>325.648578</td><td>2.048455</td><td>127889.888450</td></tr>\n",
"    <tr><th>min</th><td>-124.350000</td><td>2.000000</td><td>8.000000</td><td>1.000000</td><td>8.000000</td><td>1.000000</td><td>0.499900</td><td>14999.000000</td></tr>\n",
"    <tr><th>25%</th><td>-122.470000</td><td>23.000000</td><td>1440.000000</td><td>287.000000</td><td>699.500000</td><td>270.000000</td><td>2.708300</td><td>140900.000000</td></tr>\n",
"    <tr><th>50%</th><td>-122.300000</td><td>34.000000</td><td>2104.000000</td><td>417.000000</td><td>1019.000000</td><td>395.000000</td><td>3.729200</td><td>226500.000000</td></tr>\n",
"    <tr><th>75%</th><td>-122.160000</td><td>46.000000</td><td>3010.000000</td><td>612.500000</td><td>1479.000000</td><td>575.500000</td><td>5.001450</td><td>331500.000000</td></tr>\n",
"    <tr><th>max</th><td>-122.010000</td><td>52.000000</td><td>18634.000000</td><td>3226.000000</td><td>8276.000000</td><td>3589.000000</td><td>15.000100</td><td>500001.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" longitude housing_median_age total_rooms total_bedrooms \\\n",
"count 3967.000000 3967.000000 3967.000000 3927.000000 \n",
"mean -122.408072 33.831107 2461.945551 497.507003 \n",
"std 0.423863 13.270992 1710.886064 341.984939 \n",
"min -124.350000 2.000000 8.000000 1.000000 \n",
"25% -122.470000 23.000000 1440.000000 287.000000 \n",
"50% -122.300000 34.000000 2104.000000 417.000000 \n",
"75% -122.160000 46.000000 3010.000000 612.500000 \n",
"max -122.010000 52.000000 18634.000000 3226.000000 \n",
"\n",
" population households median_income median_house_value \n",
"count 3967.000000 3967.000000 3967.000000 3967.000000 \n",
"mean 1203.207714 468.308546 4.103604 244230.881775 \n",
"std 842.230630 325.648578 2.048455 127889.888450 \n",
"min 8.000000 1.000000 0.499900 14999.000000 \n",
"25% 699.500000 270.000000 2.708300 140900.000000 \n",
"50% 1019.000000 395.000000 3.729200 226500.000000 \n",
"75% 1479.000000 575.500000 5.001450 331500.000000 \n",
"max 8276.000000 3589.000000 15.000100 500001.000000 "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df[\"longitude\"] < -122].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise\n",
"\n",
"Select the rows whose `population` falls in a range of your choice (for example, below 1,000) by passing a boolean condition to `df.loc`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.loc[...]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Resources"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [pandas documentation: selecting a subset of a DataFrame](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}